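Graph neural networks (GNNs) have received much attention in the graph deep learning domain. However, recent research shows, both empirically and theoretically, that deep GNNs suffer from over-fitting and over-smoothing. The usual solutions either fail to address the long runtime of deep GNNs or restrict graph convolution to a single feature space. We propose Adaptive Graph Diffusion Networks (AGDNs), which perform multi-layer generalized graph diffusion in different feature spaces with moderate complexity and runtime. Standard graph diffusion methods combine large, dense powers of the transition matrix with predefined weighting coefficients. AGDNs instead combine smaller multi-hop node representations with learnable weighting coefficients. We propose two scalable mechanisms for the weighting coefficients to capture multi-hop information: Hop-wise Attention (HA) and Hop-wise Convolution (HC). We evaluate AGDNs on diverse, challenging Open Graph Benchmark (OGB) datasets with semi-supervised node classification and link prediction tasks. As of the submission date (August 26, 2022), AGDNs achieve top-1 performance on the ogbn-arxiv, ogbn-proteins and ogbl-ddi datasets and top-3 performance on the ogbl-citation2 dataset. On similar Tesla V100 GPU cards, AGDNs outperform Reversible GNNs (RevGNNs) with 13% of the complexity and 1% of the training runtime of RevGNNs on the ogbn-proteins dataset. AGDNs also achieve performance comparable to SEAL with 36% of the training runtime and 0.2% of the inference runtime of SEAL on the ogbl-citation2 dataset.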
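The idea of combining multi-hop node representations with weighting coefficients, rather than forming dense powers of the transition matrix, can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the AGDN implementation: the hop weights here are a single softmax-normalized vector shared by all nodes and channels (the abstract's HA and HC mechanisms are not specified in enough detail to reproduce), and the function names `row_normalize` and `hopwise_diffusion` are hypothetical.

```python
import math


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(adj):
    """Row-stochastic transition matrix T = D^-1 A (isolated nodes give zero rows)."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def hopwise_diffusion(adj, feats, hop_logits):
    """Return sum_k theta_k * T^k X, where theta = softmax(hop_logits).

    Assumption: one weight per hop, shared by all nodes and channels.
    T^k X is built by repeated multiplication, so T^k is never formed.
    """
    t = row_normalize(adj)
    theta = softmax(hop_logits)
    h = feats  # 0-hop representation
    out = [[theta[0] * v for v in row] for row in h]
    for k in range(1, len(theta)):
        h = matmul(t, h)  # k-hop representation from the (k-1)-hop one
        out = [[o + theta[k] * v for o, v in zip(orow, hrow)]
               for orow, hrow in zip(out, h)]
    return out


# Two connected nodes, one feature each, equal weight on hop 0 and hop 1.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0], [3.0]]
mixed = hopwise_diffusion(adj, feats, hop_logits=[0.0, 0.0])
```

In a trained model the hop logits would be learnable parameters, and the matrix products would use sparse operations.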
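Robotic ultrasound (US) imaging has been seen as a promising way to overcome the limitations of free-hand US examinations, namely inter-operator variability. However, robotic US systems cannot react to subject movements during scans, which limits their clinical acceptance. Human sonographers, by contrast, often react to patient movements by repositioning the probe or even restarting the acquisition, particularly when scanning anatomies with long structures such as limb arteries. To realize this capability, we propose a vision-based system that monitors the subject's movement and automatically updates the scan trajectory, so that a complete 3D image of the target anatomy is obtained seamlessly. The motion monitoring module is built on segmented object masks from RGB images. Once the subject moves, the robot stops and recomputes a suitable trajectory by registering the surface point clouds of the object obtained before and after the movement using the iterative closest point algorithm. Afterward, to ensure optimal contact conditions after repositioning the US probe, a confidence-based fine-tuning process is used to avoid potential gaps between the probe and the contact surface. Finally, the whole system is validated on a human-like arm phantom with an uneven surface, and the object segmentation network is also validated on volunteers. The results demonstrate that the presented system can react to object movements and reliably provide accurate 3D images.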
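Supervised depth estimation methods can achieve good performance when trained on high-quality ground truth such as LiDAR data. However, LiDAR can only generate sparse 3D maps, which leads to information loss, and high-quality per-pixel ground-truth depth data is difficult to acquire. To overcome this limitation, we propose a novel approach that combines structure information from a promising plane-and-parallax geometry pipeline with depth information in a U-Net supervised learning network, which yields quantitative and qualitative improvements over existing popular learning-based methods. In particular, the model is evaluated on two large-scale and challenging datasets, the KITTI Vision Benchmark and the Cityscapes dataset, and achieves the best performance in terms of relative error. Compared with pure depth supervision models, our model performs impressively on depth prediction for thin objects and edges, and compared with the structure prediction baseline, our model is more robust.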
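This work aims to explore a convolution-free base classifier that can be used to widen the variation of conventional ensemble classifiers. Specifically, we propose Vision Transformers as base classifiers to combine with CNNs for a unique ensemble solution in Kaggle kinship recognition. In this paper, we verify our idea by implementing and optimizing variants of the Vision Transformer model on top of existing CNN models. The combined models achieve better scores than conventional ensemble classifiers based solely on CNN variants. We demonstrate that highly optimized CNN ensembles publicly available on the Kaggle discussion board can easily gain a significant boost in ROC score simply by ensembling with variants of the Vision Transformer model, owing to the low correlation between the two model families.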
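Stereo matching is an important task in computer vision that has drawn tremendous research attention for decades. However, in terms of disparity accuracy, density and data size, public stereo datasets struggle to meet the requirements of models. In this paper, we aim to address this gap between datasets and models and propose a large-scale stereo dataset with high-accuracy disparity ground truth, named PlantStereo. We used a semi-automatic approach to construct the dataset: after camera calibration and image registration, high-accuracy disparity images can be obtained from the depth images. In total, PlantStereo contains 812 image pairs covering a diverse set of plants: spinach, tomato, pepper and pumpkin. We first evaluated the PlantStereo dataset on four different stereo matching methods. Extensive experiments on different models and plants show that, compared with ground truth of integer accuracy, the high-accuracy disparity images provided by PlantStereo can markedly improve the training of deep learning models. This paper provides a feasible and reliable method for dense reconstruction of plant surfaces. The PlantStereo dataset and related code are available at: https://www.github.com/wangqingyu985/plantstereo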
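Deriving disparity from depth after calibration can be illustrated with the standard rectified-stereo relation d = f * B / Z. The sketch below assumes a rectified pair, a focal length `focal_px` in pixels, and a baseline `baseline_m` in the same unit as depth. The function name and the convention of mapping invalid depth to 0.0 are hypothetical, and this is not taken from the PlantStereo code.

```python
def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Convert a depth map (list of rows) to sub-pixel disparity via d = f * B / Z.

    Assumption: non-positive depth marks an invalid pixel and maps to 0.0.
    """
    return [
        [focal_px * baseline_m / z if z > 0 else 0.0 for z in row]
        for row in depth_m
    ]


depth = [[2.0, 3.0, 0.0]]  # metres; the last pixel is invalid
disp_float = depth_to_disparity(depth, focal_px=1000.0, baseline_m=0.5)

# Integer-precision ground truth discards the fractional part of each value.
disp_int = [[round(d) for d in row] for row in disp_float]
```

The gap between `disp_float` and `disp_int` at the second pixel (a fraction of a pixel) is the kind of information that integer-accuracy ground truth loses.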
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
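As a rough illustration of what adding triggers to the raw data means, the sketch below stamps a fixed square patch onto a fraction of the images and relabels them with a target class before any distillation would run. The patch location, size, and value and the names `stamp_trigger` and `poison` are illustrative assumptions, since the abstract does not specify them. DOORPING's iterative trigger updates are not shown.

```python
def stamp_trigger(image, patch_size=2, value=1.0):
    """Return a copy of a 2D grayscale image with a square patch in the bottom-right corner."""
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for r in range(h - patch_size, h):
        for c in range(w - patch_size, w):
            out[r][c] = value
    return out


def poison(dataset, target_label, fraction):
    """Stamp the trigger on the first `fraction` of (image, label) pairs and relabel them."""
    n_poison = int(len(dataset) * fraction)
    return [
        (stamp_trigger(img), target_label) if i < n_poison else (img, lbl)
        for i, (img, lbl) in enumerate(dataset)
    ]


# Four blank 4x4 images with labels 0..3; poison half of them toward class 9.
clean = [([[0.0] * 4 for _ in range(4)], lbl) for lbl in range(4)]
poisoned = poison(clean, target_label=9, fraction=0.5)
```

The original images are left unmodified because `stamp_trigger` works on a copy.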
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the BIQA regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has addressed learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). These modules help the model learn complex distortion patterns and optimize the regression problem in an easy-to-hard order that mirrors the human learning process. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
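Selecting features by information gain can be sketched with the textbook definition IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and label Y. This is a generic illustration, not MGTAB's actual procedure: how the benchmark discretizes properties is not stated in the abstract, and the feature names in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature column."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Rank named feature columns by information gain and keep the k best."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: 'verified' separates the classes, 'has_avatar' does not.
labels = ["bot", "bot", "human", "human"]
columns = {
    "has_avatar": [1, 0, 1, 0],
    "verified": [0, 0, 1, 1],
}
best = top_k_features(columns, labels, k=1)
```

With k set to 20 and real property columns, the same ranking would yield the kind of top-20 selection the abstract describes.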
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solver, have suggested the dawn of overcoming this challenge. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
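The simplex equiangular tight frame can be made concrete with its usual construction from the neural collapse literature: for K classes, take the rows of sqrt(K / (K - 1)) * (I - (1/K) * 1 1^T). The sketch below builds those vectors and checks their geometry. It omits the optional rotation into a higher-dimensional feature space and is not the regularizer proposed in the paper. The function name `simplex_etf` is hypothetical.

```python
import math


def simplex_etf(num_classes):
    """Rows are the K vertices of a simplex ETF embedded in R^K."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


vertices = simplex_etf(4)
norm_sq = dot(vertices[0], vertices[0])   # every vertex has unit length
cosine = dot(vertices[0], vertices[1])    # every pair has cosine -1 / (K - 1)
```

Equal norms and an identical pairwise cosine of -1 / (K - 1) are what make the structure equiangular and maximally separated, which is the symmetry the abstract says is broken in imbalanced semantic segmentation.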